Results 1 - 2 of 2
1.
Dissertation Abstracts International: Section B: The Sciences and Engineering ; 84(6-B):No Pagination Specified, 2023.
Article in English | APA PsycInfo | ID: covidwho-2301457

ABSTRACT

Interacting with computer systems through speech is more natural than conventional interaction methods. It is also more accessible, since it does not require precise selection of small targets or rely entirely on visual elements like virtual keys and buttons. Speech also enables contactless interaction, which is of particular interest when touching public devices should be avoided, as during the recent COVID-19 pandemic. However, speech is unreliable in noisy places and can compromise users' privacy and security in public. Image-based silent speech, which primarily converts tongue and lip movements into text, can mitigate many of these challenges. Since it does not rely on acoustic features, users can speak silently without vocalizing the words. It has also been demonstrated as a promising input method on mobile devices and has been explored for a variety of audiences and contexts where the acoustic signal is unavailable (e.g., people with speech disorders) or unreliable (e.g., noisy environments). Though the method shows promise, very little is known about people's perceptions of using it, the performance they anticipate from silent speech input, and how they approach avoiding potential misrecognition errors. Moreover, existing silent speech recognition models are slow and error-prone, or use stationary, external devices that do not scale. In this dissertation, we attempt to address these issues. Towards this, we first conduct a user study exploring users' attitudes towards silent speech, with a particular focus on social acceptance. Results show that people perceive silent speech as more socially acceptable than speech input but are concerned about input recognition, privacy, and security issues. We then conduct a second study examining users' error tolerance with speech and silent speech input methods. Results reveal that users are willing to tolerate more errors with silent speech input than with speech input, as it offers a higher degree of privacy and security. We conduct another study to identify a suitable method for providing real-time feedback on silent speech input. Results show that users find a feedback method effective and significantly more private and secure than a commonly used video feedback method. In light of these findings, which establish silent speech as an acceptable and desirable mode of interaction, we take a step forward to address the technological limitations of existing image-based silent speech recognition models and make them more usable and reliable on computer systems. Towards this, we first develop LipType, an optimized version of LipNet with improved speed and accuracy. We then develop an independent repair model that processes video input to compensate for poor lighting conditions, when applicable, and corrects potential errors in the output for increased accuracy. We then test this model with LipType and other speech and silent speech recognizers to demonstrate its effectiveness. In an evaluation, the model reduced the word error rate by 57% compared to the state of the art without compromising overall computation time. However, we identify that the model is still susceptible to failure due to the variability of user characteristics. A person's speaking rate, for instance, is a fundamental user characteristic that can influence speech recognition performance owing to variation in the acoustic properties of human speech production. We therefore formally investigate the effects of speaking rate on silent speech recognition. Results reveal that native users speak about 8% faster than non-native users, but both groups slow down at comparable rates (34-40%) when interacting with silent speech, mostly to increase recognition accuracy. A follow-up experiment confirms that slowing down does improve the accuracy of silent speech recognition. (PsycInfo Database Record (c) 2023 APA, all rights reserved)
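The 57% improvement above is stated in terms of word error rate (WER), the standard metric for speech and lipreading recognizers: (substitutions + deletions + insertions) divided by the number of reference words. As a point of reference only, here is a minimal sketch of WER computation via word-level Levenshtein alignment; the function name and example phrases are illustrative and not taken from the dissertation.

    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = minimum edits turning the first i reference words
        # into the first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # One substituted word against a six-word reference -> WER = 1/6.
    print(word_error_rate("bin blue at f two now", "bin blue at f two please"))

Reporting a relative reduction, as the abstract does, means the repaired recognizer's WER is 57% lower than the baseline's WER on the same test material.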

2.
Neural Netw; 142: 316-328, 2021 Oct.
Article in English | MEDLINE | ID: covidwho-1392462

ABSTRACT

Recently, tracking models based on bounding-box regression (such as region proposal networks), built on the Siamese network, have attracted much attention. Despite their promising performance, these trackers are less effective in perceiving target information in two respects. First, existing regression models cannot take a global view of a large-scale target, since the effective receptive field of a neuron is too small to cover a large target. Second, the neurons with a fixed receptive field (RF) size in these models cannot adapt to changes in the scale and aspect ratio of the target. In this paper, we propose an adaptive ensemble perception tracking framework to address these issues. Specifically, we first construct a per-pixel prediction model, which predicts the target state at each pixel of the correlated feature. On top of the per-pixel prediction model, we then develop a confidence-guided ensemble prediction mechanism. The ensemble mechanism adaptively fuses the predictions of multiple pixels under the guidance of confidence maps, which enlarges the perception range and enhances the adaptive perception ability at the object level. In addition, we introduce a receptive field adaptation model to enhance the adaptive perception ability at the neuron level, which adjusts the RF by adaptively integrating features with different RFs. Extensive experimental results on the VOT2018, VOT2016, UAV123, LaSOT, and TC128 datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and speed.
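To make the confidence-guided ensemble idea concrete, the sketch below fuses per-pixel bounding-box predictions with weights derived from a confidence map, so that the object-level estimate draws on many pixels rather than any single neuron's receptive field. This is a minimal NumPy illustration under assumed array shapes and names; it is not the paper's actual implementation.

    import numpy as np

    def fuse_predictions(boxes: np.ndarray, confidence: np.ndarray) -> np.ndarray:
        # boxes:      (H, W, 4) array; each pixel predicts a box (x, y, w, h).
        # confidence: (H, W) map scoring how reliable each pixel's prediction is.
        w = confidence / (confidence.sum() + 1e-8)   # normalize to fusion weights
        # Confidence-weighted average over all pixels enlarges the effective
        # perception range beyond any single location's prediction.
        return (boxes * w[..., None]).sum(axis=(0, 1))

    # Toy usage: an 8x8 prediction grid with random boxes and confidences.
    rng = np.random.default_rng(0)
    boxes = rng.uniform(0, 100, size=(8, 8, 4))
    conf = rng.uniform(0, 1, size=(8, 8))
    print(fuse_predictions(boxes, conf))

A hard argmax over the confidence map would pick a single pixel's box; the soft weighting here instead lets high-confidence regions dominate while still pooling evidence, which is the spirit of the adaptive fusion the abstract describes.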


Subject(s)
Algorithms; Image Processing, Computer-Assisted; Perception; Attention